Machine Learning: Rule Learning

This post computes entropy and uses it in a simple rule learning algorithm.

The data files used are:

sailing-custom-python.tab

zoo-python.tab

1. Import the packages

import pandas as pd
import numpy as np
import math

2. Load the data with pandas

sailData = pd.read_table('sailing-custom-python.tab')
zooData = pd.read_table('zoo-python.tab')
zooData = zooData.drop(columns='name')
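
For readers without the data files, the loading step can be tried on in-memory text. This is only a sketch: the three sample rows come from the sailing table printed at the end of this post, and the file layout (one header row, tab-separated fields) is an assumption. The names `sample`, `sampleData`, and `withoutOutlook` are made up for the illustration.

```python
import io
import pandas as pd

# Hypothetical stand-in for the first rows of sailing-custom-python.tab,
# assuming a single header row and tab-separated fields.
sample = (
    "Outlook\tCompany\tSailboat\tSail\n"
    "rainy\tbig\tbig\tyes\n"
    "rainy\tbig\tsmall\tyes\n"
    "rainy\tmed\tbig\tno\n"
)

# read_table uses a tab as its default separator.
sampleData = pd.read_table(io.StringIO(sample))
print(sampleData.shape)          # (3, 4)
print(list(sampleData.columns))  # ['Outlook', 'Company', 'Sailboat', 'Sail']

# drop(columns=...) returns a new frame without the named column.
withoutOutlook = sampleData.drop(columns='Outlook')
print(list(withoutOutlook.columns))  # ['Company', 'Sailboat', 'Sail']
```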

3. A function to compute entropy

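Formula for reference, where $p_i$ is the proportion of rows that have the $i$-th value of the target column:

$$H = -\sum_i p_i \log_2 p_i$$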

def entropy(data, target):
    # Count how often each value of the target column occurs.
    count = data[target].value_counts()
    dataSize = data[target].size
    entropyValue = 0
    for value in count:
        proportion = value / dataSize
        # Accumulate -p * log2(p) for each class proportion p.
        entropyValue -= proportion * math.log(proportion, 2)
    return entropyValue

Check that the function runs:

entropy(sailData, 'Sail')

entropy(zooData, 'type')

Output:

0.9975025463691153

2.390559682294039

The function runs as expected.
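
The first value can be checked by hand. In the sailing table printed at the end of this post, 9 of the 17 rows have Sail = yes and 8 have Sail = no. The sketch below recomputes the entropy from those two counts alone; `counts`, `total`, and `h` are names chosen for the illustration.

```python
import math

# Class counts read off the sailing table: 9 'yes' rows and 8 'no' rows.
counts = [9, 8]
total = sum(counts)

# H = -sum(p * log2(p)) over the class proportions p.
h = -sum((c / total) * math.log2(c / total) for c in counts)
print(round(h, 4))  # 0.9975
```

The result agrees with the value returned by entropy(sailData, 'Sail') above.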

4. Find and return the most frequent class in the target column

def majority_class(data, targetClass):
    # idxmax on the value counts gives the most frequent value of the column.
    counts = data[targetClass].value_counts()
    return counts.idxmax()
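
To see what the two pandas calls do, here is a small sketch on made-up labels (the `labels` series is not taken from the data files): `value_counts()` returns the number of occurrences of each value, and `idxmax()` returns the value with the highest count.

```python
import pandas as pd

# Toy labels for illustration (not taken from the data files).
labels = pd.Series(['no', 'yes', 'yes', 'no', 'yes'])

counts = labels.value_counts()
print(counts['yes'], counts['no'])  # 3 2
print(counts.idxmax())              # yes
```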

5. A simple rule learning function

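def simpler_rule_learner(data, target):
    # Repeatedly pick the purest "attribute = value" rule and drop the rows it covers.
    while data.shape[0] > 0:
        current_entropy = entropy(data, target)
        if current_entropy == 0:
            # All remaining rows share one class: print the default rule and stop.
            print("otherwise =>", majority_class(data, target))
            break
        best_entropy = current_entropy
        best_attribute = None
        best_value = None
        best_data = data

        for attribute in data.columns:
            if attribute == target:
                continue  # never use the target column itself as a condition
            for value in data[attribute].dropna().unique():
                data2 = data.loc[data[attribute] == value]
                candidate_entropy = entropy(data2, target)
                if candidate_entropy < best_entropy:
                    best_entropy = candidate_entropy
                    best_attribute = attribute
                    best_value = value
                    best_data = data2

        if best_attribute is None:
            # No condition lowers the entropy: fall back to the majority class.
            print("otherwise =>", majority_class(data, target))
            break
        print(best_attribute, "=", best_value, "=>", majority_class(best_data, target))
        data = data.loc[data[best_attribute] != best_value]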

Testing the function:

simpler_rule_learner(sailData, 'Sail')

Company = big => yes
Outlook = rainy => no
Company = med => yes
Sailboat = small => yes
otherwise => no

simpler_rule_learner(zooData, 'type')

feathers = Yes => bird
milk = Yes => mammal
hair = Yes => insect
airborne = Yes => insect
fins = Yes => fish
legs = 8.0 => invertebrate
eggs = No => reptile
breathes = No => invertebrate
aquatic = Yes => amphibian
predator = Yes => reptile
backbone = Yes => reptile
legs = 6.0 => insect
otherwise => invertebrate

At this point the simple rule learner produces the expected output.
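
Because each rule is learned only from the rows that earlier rules did not cover, the printed rules are best read as an ordered list: a row gets the class of the first rule it matches, and the "otherwise" line is the default. The following is a minimal sketch of that reading for the sailing rules; `RULES`, `DEFAULT`, and `classify` are hypothetical names and not part of the code above.

```python
# Rules printed for the sailing data, in the order they were learned.
RULES = [
    ('Company', 'big', 'yes'),
    ('Outlook', 'rainy', 'no'),
    ('Company', 'med', 'yes'),
    ('Sailboat', 'small', 'yes'),
]
DEFAULT = 'no'  # the "otherwise" rule

def classify(row):
    """Return the class of the first rule whose condition matches the row."""
    for attribute, value, label in RULES:
        if row[attribute] == value:
            return label
    return DEFAULT

# Order matters: this row is rainy, but "Company = big" is checked first.
print(classify({'Outlook': 'rainy', 'Company': 'big', 'Sailboat': 'big'}))    # yes
print(classify({'Outlook': 'sunny', 'Company': 'no', 'Sailboat': 'small'}))  # yes
print(classify({'Outlook': 'sunny', 'Company': 'no', 'Sailboat': 'big'}))    # no
```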

Note: to select the rows in which a column has a particular value, use data.loc as follows:

print(sailData)
print()
attribute = 'Outlook'
value = 'rainy'
print(sailData.loc[sailData[attribute]==value])

   Outlook Company Sailboat Sail
0    rainy     big      big  yes
1    rainy     big    small  yes
2    rainy     med      big   no
3    rainy     med    small   no
4    sunny     big      big  yes
5    sunny     big    small  yes
6    sunny     med      big  yes
7    sunny     med      big  yes
8    sunny     med    small  yes
9    sunny      no    small  yes
10   sunny      no      big   no
11   rainy     med      big   no
12   rainy      no      big   no
13   rainy      no      big   no
14   rainy      no    small   no
15   rainy      no    small   no
16   sunny     big      big  yes

   Outlook Company Sailboat Sail
0    rainy     big      big  yes
1    rainy     big    small  yes
2    rainy     med      big   no
3    rainy     med    small   no
11   rainy     med      big   no
12   rainy      no      big   no
13   rainy      no      big   no
14   rainy      no    small   no
15   rainy      no    small   no
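
The rule learner relies on the same mechanism twice: once to select the rows a candidate rule covers, and once to keep the rows it does not cover. The sketch below shows both on a small made-up frame (`df` is hypothetical); the comparison yields a boolean Series, and `~` negates it, which for this frame selects the same rows as a `!=` comparison.

```python
import pandas as pd

# Small hypothetical frame for illustration.
df = pd.DataFrame({
    'Outlook': ['rainy', 'sunny', 'rainy', 'sunny'],
    'Sail':    ['no', 'yes', 'no', 'no'],
})

mask = df['Outlook'] == 'rainy'   # boolean Series, one entry per row
covered = df.loc[mask]            # rows matching the condition
remaining = df.loc[~mask]         # rows left over for the next pass

print(list(covered.index))    # [0, 2]
print(list(remaining.index))  # [1, 3]
```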

That's all.

Tags: machine learning
